
596 research outputs found

Symmetric Spaces and Star representations II : Causal Symmetric Spaces

Author: Arnal
Bayen
Bieliavsky
Fronsdal
M. Pevzner
P. Bieliavsky
Publication venue: Elsevier BV
Publication date: 08/05/2001

We construct and identify star representations canonically associated with holonomy reducible simple symplectic symmetric spaces. This leads to a non-commutative geometric realization of the correspondence between causal symmetric spaces of Cayley type and Hermitian symmetric spaces of tube type. Comment: 13 pages

arXiv.org e-Print Archive

Crossref

DI-fusion

On NP-Hardness of the Paired de Bruijn Sound Cycle Problem

Author: J. Galant
P. Medvedev
P.A. Pevzner
S.R. Mahaney
Publication date: 01/01/2013

The paired de Bruijn graph is an extension of the de Bruijn graph that incorporates mate pair information for genome assembly, proposed by Medvedev et al. However, unlike in an ordinary de Bruijn graph, not every path or cycle in a paired de Bruijn graph will spell a string, because there is an additional soundness constraint on the path. In this paper we show that the problem of checking whether there is a sound cycle in a paired de Bruijn graph is NP-hard in the general case. We also explore some of its special cases, as well as a modified version where the cycle must also pass through every edge. Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013)

arXiv.org e-Print Archive

Crossref

An Efficient Algorithm For Chinese Postman Walk on Bi-directed de Bruijn Graphs

Author: D.R. Zerbino
E.S. Lander
E.W. Myers
J. Craig Venter
P. Medvedev
P.A. Pevzner
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

Sequence assembly from short reads is an important problem in biology. It is known that solving the sequence assembly problem exactly on a bi-directed de Bruijn graph or a string graph is intractable. However, finding a Shortest Double stranded DNA string (SDDNA) containing all the k-long words in the reads seems to be a good heuristic to get close to the original genome. This problem is equivalent to finding a cyclic Chinese Postman (CP) walk on the underlying un-weighted bi-directed de Bruijn graph built from the reads. The Chinese Postman walk Problem (CPP) is solved by reducing it to a general bi-directed flow on this graph, which runs in O(|E|^2 log^2(|V|)) time. In this paper we show that the cyclic CPP on bi-directed graphs can be solved without reducing it to bi-directed flow. We present a Θ(p(|V| + |E|) log(|V|) + (d_max p)^3) time algorithm to solve the cyclic CPP on a weighted bi-directed de Bruijn graph, where p = max{|{v | d_in(v) - d_out(v) > 0}|, |{v | d_in(v) - d_out(v) < 0}|} and d_max = max{|d_in(v) - d_out(v)|}. Our algorithm performs asymptotically better than the bi-directed flow algorithm when the number of imbalanced nodes p is much smaller than the number of nodes in the bi-directed graph. From our experimental results on various datasets, we have noticed that the value of p/|V| lies between 0.08% and 0.13% with 95% probability.
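The imbalance quantities p and d_max defined in the abstract above can be illustrated with a minimal sketch. It assumes an ordinary directed multigraph given as a list of (u, v) edge pairs, which is a simplification: the paper works on bi-directed graphs, where each edge end carries its own orientation. The function name `imbalance_parameters` and the example edge list are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def imbalance_parameters(edges):
    """Return (p, d_max) for a directed multigraph given as (u, v) pairs.

    Assumption: plain directed edges, not the bi-directed edges of the paper.
    """
    d_in = defaultdict(int)
    d_out = defaultdict(int)
    for u, v in edges:
        d_out[u] += 1
        d_in[v] += 1

    nodes = set(d_in) | set(d_out)
    # Signed imbalance d_in(v) - d_out(v) for every node that appears in an edge.
    diff = {v: d_in[v] - d_out[v] for v in nodes}

    surplus = sum(1 for d in diff.values() if d > 0)  # nodes with d_in > d_out
    deficit = sum(1 for d in diff.values() if d < 0)  # nodes with d_in < d_out
    p = max(surplus, deficit)
    d_max = max((abs(d) for d in diff.values()), default=0)
    return p, d_max


# Hypothetical toy graph: node "c" has one extra incoming edge,
# node "d" has one extra outgoing edge, and "a" and "b" are balanced.
example_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "a")]
p, d_max = imbalance_parameters(example_edges)
print(p, d_max)
```

When p is small relative to |V|, as the reported p/|V| values suggest, the (d_max p)^3 term in the stated bound stays small compared with the term that scales with the graph size.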

arXiv.org e-Print Archive

Crossref

SEQuel: improving the accuracy of genome assemblies

Author: Bentley
C. Boucher
Chitsaz
Compeau
Ewing
H. Chitsaz
Idury
Kelley
Klein
Li
P. Pevzner
Pevzner
R. Ronen
Raghunathan
Robinson
Rodrigue
Tammi
Wheeler
Zhi
Publication venue: Oxford University Press
Publication date: 01/01/2012

Motivation: Assemblies of next-generation sequencing (NGS) data, although accurate, still contain a substantial number of errors that need to be corrected after the assembly process. We develop SEQuel, a tool that corrects errors (i.e. insertions, deletions and substitution errors) in the assembled contigs. Fundamental to the algorithm behind SEQuel is the positional de Bruijn graph, a graph structure that models k-mers within reads while incorporating the approximate positions of reads into the model.

CiteSeerX

Crossref

PubMed Central

The Fibers and Range of Reduction Graphs in Ciliates

Author: A. Bergeron
A. Ehrenfeucht
Hendrik Jan Hoogeboom
J. Setubal
P. Pevzner
R. Brijder
Robert Brijder
S. Hannenhalli
Publication venue: Springer Science and Business Media LLC
Publication date: 07/02/2007

The biological process of gene assembly has been modeled based on three types of string rewriting rules, called string pointer rules, defined on so-called legal strings. It has been shown that reduction graphs for legal strings, which are based on the notion of the breakpoint graph in the theory of sorting by reversal, provide valuable insights into the gene assembly process. We characterize which legal strings obtain the same reduction graph (up to isomorphism), and moreover we characterize which graphs are (isomorphic to) reduction graphs. Comment: 24 pages, 13 figures

arXiv.org e-Print Archive

Crossref

A New Simulated Annealing Algorithm for the Multiple Sequence Alignment Problem: The approach of Polymers in a Random Media

Author: A. Godzik
D. Gunsfield
J. Kim
M. Hernández-Guía
M. Ishikawa
M. S. Waterman
P. Pevzner
R. Durbin
R. Mulet
S. Geman
S. Rodríguez-Pérez
Publication venue: American Physical Society (APS)
Publication date: 10/01/2005

We propose a probabilistic algorithm to solve the Multiple Sequence Alignment problem. The algorithm is a Simulated Annealing (SA) that exploits the representation of the Multiple Alignment between D sequences as a directed polymer in D dimensions. Within this representation we can easily track the evolution in the configuration space of the alignment through local moves of low computational cost. At variance with other probabilistic algorithms proposed to solve this problem, our approach allows for the creation and deletion of gaps without extra computational cost. The algorithm was tested by aligning proteins from the kinase family. When D=3 the results are consistent with those obtained using a complete algorithm. For D>3, where the complete algorithm fails, we show that our algorithm still converges to reasonable alignments. Moreover, we study the space of solutions obtained and show that, depending on the number of sequences aligned, the solutions are organized in different ways, suggesting a possible source of errors for progressive algorithms. Comment: 7 pages and 11 figures

arXiv.org e-Print Archive

Crossref

Thermodynamics of protein folding: a random matrix formulation

Author: Betancourt M R
Creighton T E
Frauenfelder H
Kleinberg J
Istrail S
Pevzner P
Waterman M
Lee S
Mehta M L
Pragya Shukla
Richards F M
Shortle D
Shukla P
van den Berg B
Publication venue: IOP Publishing
Publication date: 16/10/2010

The process of protein folding from an unfolded state to a biologically active, folded conformation is governed by many parameters, e.g. the sequence of amino acids, intermolecular interactions, the solvent, temperature and chaperone molecules. Our study, based on random matrix modeling of the interactions, shows however that the evolution of the statistical measures, e.g. Gibbs free energy, heat capacity and entropy, is single parametric. The information can explain the selection of specific folding pathways from an infinite number of possible ways as well as other folding characteristics observed in computer simulation studies. Comment: 21 pages, no figures

arXiv.org e-Print Archive

Crossref

An Integrative Method for Accurate Comparative Genome Mapping

Author: Eduardo P. C Rocha
Firas Swidan
Michael Shmoish
Pavel Pevzner
Ron Y Pinter
Publication venue: Public Library of Science
Publication date: 01/01/2006

We present MAGIC, an integrative and accurate method for comparative genome mapping. Our method consists of two phases: preprocessing for identifying “maximal similar segments,” and mapping for clustering and classifying these segments. MAGIC's main novelty lies in its biologically intuitive clustering approach, which aims towards both calculating reorder-free segments and identifying orthologous segments. In the process, MAGIC efficiently handles ambiguities resulting from duplications that occurred before the speciation of the considered organisms from their most recent common ancestor. We demonstrate both MAGIC's robustness and scalability: the former is asserted with respect to its initial input and with respect to its parameters' values. The latter is asserted by applying MAGIC to distantly related organisms and to large genomes. We compare MAGIC to other comparative mapping methods and provide detailed analysis of the differences between them. Our improvements allow a comprehensive study of the diversity of genetic repertoires resulting from large-scale mutations, such as indels and duplications, including explicitly transposable and phagic elements. The strength of our method is demonstrated by detailed statistics computed for each type of these large-scale mutations. MAGIC enabled us to conduct a comprehensive analysis of the different forces shaping prokaryotic genomes from different clades, and to quantify the importance of novel gene content introduced by horizontal gene transfer relative to gene duplication in bacterial genome evolution. We use these results to investigate the breakpoint distribution in several prokaryotic genomes

Public Library of Science (PLOS)

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Safe and complete contig assembly via omnitigs

Author: A Bankevich
A Guénoche
AR Rubinov
AS Motahari
C Kingsford
D Haussler
DR Zerbino
E Kapun
ES Lander
G Bresler
G Narzisi
I Lysov
JD Kececioglu
JR Miller
JT Simpson
K Lam
K Sahlin
L Salmela
M Boetzer
N Nagarajan
N Vyahhi
P Medvedev
PA Pevzner
R Chikhi
R Luo
R Uricaru
RM Idury
SL Salzberg
Publication date: 16/08/2016

Contig assembly is the first stage that most assemblers solve when reconstructing a genome from a set of reads. Its output consists of contigs -- a set of strings that are promised to appear in any genome that could have generated the reads. From the introduction of contigs 20 years ago, assemblers have tried to obtain longer and longer contigs, but the following question was never solved: given a genome graph G (e.g. a de Bruijn, or a string graph), what are all the strings that can be safely reported from G as contigs? In this paper we finally answer this question, and also give a polynomial time algorithm to find them. Our experiments show that these strings, which we call omnitigs, are 66% to 82% longer on average than the popular unitigs, and 29% of dbSNP locations have more neighbors in omnitigs than in unitigs. Comment: Full version of the paper in the proceedings of RECOMB 201

arXiv.org e-Print Archive

Crossref